POEM: 1-Bit Point-Wise Operations Based on E-M for Point Cloud Processing


Algorithm 12 POEM training. L is the loss function (the sum of L_S and L_R) and N is the number of layers. Binarize() binarizes the filters using the binarization in Eq. 6.36, and Update() updates the parameters according to our update scheme.

Input: a minibatch of inputs and their labels, unbinarized weights w, scale factor α, learning rate η.
Output: updated unbinarized weights w^{t+1}, updated scale factor α^{t+1}.

1:  {1. Computing the gradients with respect to the parameters:}
2:  {1.1. Forward propagation:}
3:  for i = 1 to N do
4:      b^{w_i} ← Binarize(w_i) (using Eq. 6.36)
5:      Bi-FC feature calculation using Eq. 6.87 – 6.72
6:      Loss calculation using Eq. 6.88 – 6.44
7:  end for
8:  {1.2. Backward propagation:}
9:  for i = N to 1 do
10:     {Note that the gradients are not binary.}
11:     Compute δ_w using Eq. 6.89 – 6.59
12:     Compute δ_α using Eq. 6.60 – 6.62
13:     Compute δ_p using Eq. 6.63 – 6.64
14: end for
15: {Accumulating the parameter gradients:}
16: for i = 1 to N do
17:     w^{t+1} ← Update(δ_w, η) (using Eq. 6.89)
18:     α^{t+1} ← Update(δ_α, η) (using Eq. 6.61)
19:     p^{t+1} ← Update(δ_p, η) (using Eq. 6.64)
20:     η^{t+1} ← Update(η) according to the learning rate schedule
21: end for
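To make the control flow of Algorithm 12 concrete, the following is a minimal, self-contained PyTorch sketch of a single training iteration for one Bi-FC layer. It is only an illustrative approximation, not the reference implementation: the binarization uses a plain sign() with a straight-through estimator rather than Eq. 6.36, the reconstruction term is a generic stand-in for L_R, and the EM term and the additional parameters p are omitted; only the forward/backward/update structure of the algorithm is reproduced.

```python
import torch
import torch.nn.functional as F

# Hypothetical single-layer setup: all shapes and hyperparameters are illustrative.
torch.manual_seed(0)
w = torch.randn(10, 64, requires_grad=True)         # latent (unbinarized) weights
alpha = torch.ones(1, requires_grad=True)           # scale factor
eta, lam = 1e-2, 1e-4                               # learning rate and trade-off weight

x = torch.randn(32, 64)                             # a minibatch of point features
labels = torch.randint(0, 10, (32,))

# 1.1 Forward propagation: binarize the weights, compute Bi-FC features and the loss.
bw = torch.sign(w).detach() + w - w.detach()        # sign() with a straight-through estimator
logits = F.linear(x, alpha * bw)
L_S = F.cross_entropy(logits, labels)               # supervision loss
L_R = 0.5 * ((w - alpha * bw.detach()) ** 2).sum()  # generic reconstruction-style loss (stand-in)
loss = L_S + lam * L_R

# 1.2 Backward propagation: the gradients of w and alpha are real-valued, not binary.
loss.backward()

# Accumulating the parameter gradients and updating (the EM term and the extra
# parameters p are omitted here; see the sketch after Eq. 6.59).
with torch.no_grad():
    w -= eta * w.grad
    alpha.copy_((alpha - eta * alpha.grad).abs())   # mirrors the |.| update of Eq. 6.61
    w.grad = None
    alpha.grad = None
```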

Then, we optimize $w_i^j$ as

$$\delta_{w_i^j} = \frac{\partial L_S}{\partial w_i^j} + \lambda \frac{\partial L_R}{\partial w_i^j} + \tau\,\mathrm{EM}(w_i^j), \qquad (6.58)$$

where $\tau$ is the hyperparameter that controls the proportion of the Expectation-Maximization operator $\mathrm{EM}(w_i^j)$. $\mathrm{EM}(w_i^j)$ is defined as

$$\mathrm{EM}(w_i^j) =
\begin{cases}
\sum_{k=1}^{2} \hat{\xi}_i^{jk}\,\bigl(\hat{\mu}_i^k - w_i^j\bigr), & \hat{\mu}_i^1 < w_i^j < \hat{\mu}_i^2, \\
0, & \text{otherwise}.
\end{cases} \qquad (6.59)$$
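The case analysis in Eq. 6.59 amounts to a responsibility-weighted pull of each weight toward the two cluster means, applied only to weights lying strictly between the means. The sketch below is one possible reading of the formula; mu_hat (the two estimated means, with mu_hat[0] < mu_hat[1]) and xi_hat (the E-step responsibilities) are assumed inputs, and the names are illustrative rather than taken from the reference code.

```python
import torch

def em_operator(w, mu_hat, xi_hat):
    """EM(w) of Eq. 6.59 (illustrative reading, not reference code).

    w:      latent weights of one Bi-FC layer, shape (J,)
    mu_hat: estimated cluster means, shape (2,), with mu_hat[0] < mu_hat[1]
    xi_hat: responsibilities of each weight w.r.t. the two clusters, shape (J, 2)
    """
    # Sum over k of xi_hat[j, k] * (mu_hat[k] - w[j]): pulls each weight toward
    # the cluster means, weighted by its responsibilities.
    pull = (xi_hat * (mu_hat.unsqueeze(0) - w.unsqueeze(1))).sum(dim=1)
    # The operator is only active for weights lying strictly between the means.
    inside = (w > mu_hat[0]) & (w < mu_hat[1])
    return torch.where(inside, pull, torch.zeros_like(w))

def delta_w(grad_LS, grad_LR, w, mu_hat, xi_hat, lam, tau):
    """delta_w of Eq. 6.58: supervision + reconstruction gradients + EM term."""
    return grad_LS + lam * grad_LR + tau * em_operator(w, mu_hat, xi_hat)
```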

Updating $\alpha_i$: We further update the scale factor $\alpha_i$ with $w_i$ fixed. $\delta_{\alpha_i}$ is defined as the gradient of $\alpha_i$, and we have

$$\delta_{\alpha_i} = \frac{\partial L_S}{\partial \alpha_i} + \lambda \frac{\partial L_R}{\partial \alpha_i}, \qquad (6.60)$$

$$\alpha_i \leftarrow \bigl|\,\alpha_i - \eta\,\delta_{\alpha_i}\bigr|, \qquad (6.61)$$

where $\eta$ is the learning rate. The gradient derived from the softmax loss can easily be calculated by backpropagation. Based on Eq. 6.44, we have

$$\frac{\partial L_R}{\partial \alpha_i} = \bigl(w_i - \alpha_i\, b^{w_i}\bigr) \cdot b^{w_i}. \qquad (6.62)$$
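Assuming the reconstruction of Eqs. 6.60–6.62 above, the scale-factor update can be sketched as follows; grad_LS_alpha stands for the softmax-loss gradient obtained by backpropagation, and the sign of the reconstruction gradient should be checked against Eq. 6.44, which is not reproduced in this section.

```python
import torch

def update_alpha(alpha, w, bw, grad_LS_alpha, lam, eta):
    """Scale-factor update of Eqs. 6.60-6.61 (sketch under the assumptions above)."""
    # Eq. 6.62: reconstruction gradient w.r.t. alpha (sign as reconstructed above).
    grad_LR_alpha = ((w - alpha * bw) * bw).sum()
    # Eq. 6.60: combine the supervision and reconstruction gradients.
    delta_alpha = grad_LS_alpha + lam * grad_LR_alpha
    # Eq. 6.61: gradient step followed by |.|, which keeps alpha non-negative.
    return (alpha - eta * delta_alpha).abs()
```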